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Abstract 

We determine the minimal experimental resources that ensure a unique 
solution in the estimation of trace-preserving quantum channels with both 
direct and convex optimization methods. A convenient parametrization of 
the constrained set is used to develop a globally converging Newton-type 
algorithm that ensures a physically admissible solution to the problem. 
Numerical simulations are provided to support the results, and indicate 
that the minimal experimental setting is sufficient to guarantee good es- 
timates. 

1 Introduction 

Recent advances and miniaturization in laser technology and electronic de- 
vices, together with some profound results in quantum physics and quantum 
information theory, have generated in the last two decades increasing inter- 
est in the promising field of quantum information engineering. The poten- 
tial of these new technologies have been demonstrated by a number of theo- 
retical and experimental results, including intrinsically-secure quantum cryp- 
tography protocols, proof-of-principle implementation of quantum computing, 
as well as dramatic advances in controlled engineering of molecular dynamics, 
opto-mechanical devices, and many other experimentally available systems. In 
this key role is played by control, estimation and identification prob- 

lems for quantum-mechanical systems [HI |38l |32j [31] [21] . Important contri- 
butions to this multi-disciplinary research effort have been offered by a num- 
ber control scientists, among which we would like to remember Mohammed 
Daleh. A few examples dealing with control and estimation problems are 
[T3[Il[ISlEl[l[IHl[3i[3ni[22[13IMlia[lZl,and many more may be found in 
the surveys [H [19] . 

In the spirit of developing research which is both strongly motivated by emerging 
applications and mathematically rigorous, we consider an identification prob- 
lem arising in the reconstruction of quantum dynamical models from experi- 
mental data. This is a key issue in many quantum information processing tasks 
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[TTl [Sn [551 HO]- For example, a precise knowledge of the behavior of a 
channel to be used for quantum computation or communications is needed in 
order to ensure that optimal encoding/decoding strategies are employed, and 
verify that the noise thresholds for hierarchical error-correction protocols, or for 
effectiveness of quantum key distribution protocols, are met |29l . In many 
cases of interest, for example in communication in free-space [37], channels are 
not stationary and to ensure good performances, repeated and fast estimation 
steps would be needed as a prerequisite for adaptive information encodings. 
Motivated by these potential applications, we here focus on: (i) characterizing 
the minimum experimental setting needed for a consistent estimation of the 
channel; (ii) exploring how a minimal parametrization of the models can be 
exploited to reduce the complexity of the estimation algorithm; and (iii) test- 
ing (numerically) the minimal experimental setting, and compare it to "richer" 
experimental resources (in terms of available probe states and measured ob- 
servables). In doing this, we present a general framework for the estimation of 
physically-admissible trace preserving quantum channels by minimizing a suit- 
able class of (convex) loss functions which contains, as special cases, commonly 
used maximum likelihood (ML) functionals. In the large body of literature re- 
garding channel estimation, or quantum process tomography (see e.g. |3H 128) 
and references therein), the experimental resources are usually assumed to be 
given. Mohseni et al. [53] compare different strategies, but focus on the role of 
having entangled states as an additional resource, while we shall assume there 
is no additional quantum system to work with. The problem we study is closer 
in spirit to that taken in [35] while studying minimal state tomography. 
Our result include the determination of the minimal experimental resources (or 
quorum, in the language of [17j ) for Trace-Preserving (TP) channels estima- 
tion, as part of a thorough theoretical analysis of both the inversion (direct, 
or standard tomography) method and a class of convex methods, including the 
widely-used maximum likelihood approach. The method we propose constrains 
the set of channels of the optimization problem to TP maps from the beginning, 
as opposed to the most common approach that introduces the TP constraint 
through a Lagrange multiplier |31l I28j . This allow for an immediate reduction 
from to d'^ — d^ parameters in estimation problem. Our analysis can also 
be considered as complementary to the one presented in [^, where the TP as- 
sumption is relaxed to include losses. We provide a rigorous presentation of the 
results and we try, whenever possible, to make contact with ideas and meth- 
ods of (classical) system identification. We next exploit the same convenient 
parametrization for TP channels we use in the theoretical analysis for develop- 
ing a Newton-type algorithm with barriers, which ensures convergence in the set 
of physically-admissible maps. Numerical simulations are provided, confirming 
that the standard tomography method quite often fails to provide a positive 
map, and showing that experimental settings richer than the minimal one (in 
terms of input states and observables) do not lead to better performances (fixed 
the total number of available "trials"). 
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2 Quantum channels and x-i"epresentation 



Consider a d-level quantum system with associated Hilbert space H isomorphic 
to C^. The state of the system is described by a density operator, namely by a 
positive, unit-trace matrix 

p e = {pe C'^^'Ip = p^>0, tr(p) = 1}, 

which plays the role of probability distribution in classical probability. A state 
is called pure if it has rank one, and hence it is represented by an orthogonal 
projection matrix on a one-dimensional subspace. Measurable quantities or ob- 
servables are associated with Hermitian matrices X — Xfellfc, with {11^} the 
associated spectral family of orthogonal projections. Their spectrum {xk} rep- 
resents the possible outcomes, and the probability of observing the fcth outcome 
can be computed as ^^(11^) — trljlkp)- 

A quantum channel (in Schrodinger's picture) is a map £ : — ^ 1){H). 
It is well known [211 HI] that a physically admissible quantum channel must 
be linear and Completely Positive (CP), namely it admits an Operator-Sum 
Representation (GSR) 

where Ki e C'^^'^ are called Kraus operators. In order to be Trace Preserving 
(TP), a necessary condition to map states to states, it must also hold that 

where Id is the d x d identity matrix. 

An alternative way to describe a CPTP channel is offered by the x-rspfesentation. 
Each Kraus operator Kj e C'*^'' can be expressed as a linear combination (with 
complex coefficients) of {i^Tn}m=ii being the elementary matrix Ejk, with 
m — {j ~ l)d+ k. Accordingly, the OSR ([T]) can be rewritten as 

£{P)= E Xn^,nFmpFl (3) 
m,n— 1 

where x is the d^ x d^ Hermitian matrix with element Xm.n in position {m,n). 
It easy to see that it must satisfy 

X = > (4) 

and (following from ^) 

d^ 

J2 Xm^nFiFrn^h- (5) 
m,n— 1 



(2) 
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The map £ is completely determined by the matrix x- 

We now introduce an helpful lemma which provides us with a parametriza- 
tion of trace preserving maps, and an easy formula for computing probabilities 
in terms x Q ^'^^ ^ brief review of the partial trace definition and properties, 
see the Appendix [X] 

Lemma 2.1 Let £^ be a CPTP map associated with a given x- Then for any 

Sx{p)^tr2{x{h®P^)). (6) 

Proof. Let us rewrite each Fj as the corresponding elementary matrix , with 
j = {I — l)d + m, k = (n— l)d + p, and relabel Xjk as ximnp accordingly. Hence 
we get 

X = ^ XlmnpEln ^ i?mp, (7) 
l,rn,n.p 

and 

^x(P) ^ 5Z XlmnpElmPEpn- 
l.m,n.p 

We can also expand p = X^rs PrsErsi and substitute it in the above expression. 
Taking into account that Ei^Enp = SmnEip, and defining [xf^lmp = Ximnp, we 
get: 

' ^ ^ PrsX-lmnp-^lm^rs-^pn 

^ ^ PrsX-lrns ^hi ; 

= ^ I ^ PrsXlrris j -^/n 
/,n \ r.s / 

= T.ip'^^^n)Eln, 

1,71 

= tr^ixil <E> p^)) 

where we used the fact that xfn corresponds to the d x d dimensional block 
of X in position (Z,n), and that for every pair of matrices X,Y, we can write 

This leads to a useful expression for the computation of the expectations. 

Corollary 2.1 Let us consider a state p, a projector 11 and a quantum channel 
£ with associated x-f£pi^esentation matrix x- Then 

P£(p)(n) = tr{£{p)n) - trixin^p^)). 



^ These results implicitly relate the x matrix emerging from the basis of elementary ma- 
trices we chose to the Choi matrix = 5Zmn ^rnn £( Emn)\32\. In fact, either by direct 
computation or by confronting formula ^ with its equivalent for the Choi matrix Cs (see 
e.g. 1311 , chapter 2), it is easy to see that = OxO^ , where O is the unique unitary such 

that o{x r)ot = y (gi X [H]. 
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Proof. It suffices to substitute ([6| in p^.p(n) = tr(5(/9)n), and use the identity 
tr(X(g)/)y) =tr(Xtr2(y)). □ 

The TP condition ([5| can also be re-expressed directly in terms of the x 
matrix. 

Corollary 2.2 Let us consider a CP map £^ with associated x-fepi^esentation 
matrix x- Then £-)^ is TP if and only if 

tri{x)=Id- (8) 



Proof. Using the same notation we used in the proof of Lemma 2.1 
re-espress the TP condition ([s]) as: 

^ ^ XlninpT^pn-T^lm ^ ^ \lmlpT^pm ^1 x (x) . 
l,m,n^p l,m,p 



□ 



3 Identification Protocols 

Consider the following setting: a quantum system prepared in a known pure 
state p is fed to an unknown channel £. The system in the output state £{p) 
is then subjected to a projective measurement of an observable: to our aim 
it will be sufficient to consider yes-no measurements associated to orthogonal 
projections 11 = 11^ = 11^. Hence the outcome x is in the set {0, 1}, and can be 
interpreted as a sample of the random variable X which has distribution 

P , _ / Px,p(n), if x = 1 , . 

'^x(-) p-\ i-px.p(n), ifx = o 

where Py_^p{Jl) = tr(£^(p)n) is the probability that the measurement of 11 re- 
turns outcome 1 when the state is £x{p)- 

Assume that the experiment is repeated with a series of known input (pure) 
states {pfc}fc=i, and to each trial the same orthogonal projections {Ilj}^!]^ are 
measured N times, obtaining a series of outcomes {x\^}. We consider the 
sampled frequencies to be our data, namely 

1 ^ 

/.fc^=]^E-f- (10) 

1=1 

The channel identification problem (or as it is referred to in the physics liter- 
ature, the quantum process tomography problem [311 [291 128] ) we are concerned 
with consists in constructing a Kraus map £^ that fits the experimental data (in 
some optimal way), in particular estimating a matrix x satisfying constraints 
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3.1 Necessary and sufficient conditions for identifiability 

It is well known |33l I31j that by imposing linear constraints associated to the 
TP condition |5| , or equivalently ([S]) , one reduces the real degrees of freedom 
of X to — d. This will be made explicit in the following, by parameterizing 
X in a "generalized" Pauli basis (also known as gell-mann matrices, Fano basis 
or coherence vector representation in the case of states OH EI])- Usually the 
trace preserving constraint is not directly included in the standard tomography 
method [28], since in principle it should emerge from the physical properties of 
the channel, or it is imposed through a (nonlinear) Lagrange multiplier in the 
maximum likelihood approach |31j . Here, in order to investigate the minimum 
number of probe (input) states and measured projectors needed to uniquely 
determine Xj it is convenient to include this constraint from the very beginning. 
Doing so, we lose the possibility of exploiting a Cholesky factorization in order 
to impose positive semidefiniteness of x- noentheless, we show in Section |3.5| 
that semidefiniteness of the solution can be imposed algorithmically by using a 
barrier method |12| . 

Consider an orthonormal basis for (P x (P Hermitian matrices of the form 
{aj (g) cr/c}j,fc=o,i....,rf2_i, where ctq = I/Vdid, while {crj}j=i....^rf2_i is a basis for 
the traceless subspaces. We can now write 

X = ^SjkO-j (g) Uk- 
jk 

If we now substitute it into ([s]), we get: 

Id = tri(x) = ^ s.jktv{aj)ak = X! 

jk k 

and hence, since the Uj are linearly independent, we can conclude that sqo = 
1, soj = for j = 1, . . . , — 1. Hence, the free parameters for a TP map (at 
this point not necessarily CP, since we have not imposed the positivity of x yet) 

are d'^ — d^ , as we can write any TP x ^'^ X — d^^Id^ + fclo ^ ^jk'^j ^ '^k, 

or, in a more compact notation, 

d'^-d^ 

x{0) = d-^id2 + J2 ^^Q^' (11) 

by rearranging the double index j, A: in a single £, and defining the corresponding 

Qt = CTj (8) CTfe. 

The X matrices corresponding to TP maps thus form an affine space. Let us 
call it linear part 

Stp = span{crj (g crk}j=i^,,,^d^-i^k=o,...,d^-i- 
It is convenient to define 

B,k = {Tlj - ^I) (g pI (12) 
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and B — span{Bjk}j=i,....M,k=i,....L- Since we have Qi = crj^o ^ Ck, it holds 
that 

tr(g,(n, pD) = tr(g,B,fc)- (i3) 

Let us also introduce the function 

with the component of g(9) in position (j, fc) is defined as 

gjkio) = Px(©,pJn,) - tr(x(^)(n, ® p^)- (i4) 

Proposition 3.1 g is injective if and only if Stp C B. 



Proof. Given (14), we have that 

9jk{Si) - 9A^2) = tr[(x(^i) - x(^2))(n, ® pI)] 

where S{9^ - 9^) = x{h) - xiO^) = Yf=r/ iKi - ^2.()Q/ e Stp- So, we have 
that 

,g(£i) - 5(^2) ^ (^(^1 - ^2), Sjfc) = V J, k. (15) 



Assume 5tp C B : the only clement of Stp for which the r.h.s. of (15) could 
be true is zero. Since by definition S[9i — ^2) = if ^.nd only if £1 = ^2 ; 5 
is injective. On the other hand, assume that Stp % B : therefore there exists 
T ^ e 5tp n ■B-^ such that 

T = ^7£Qf, (r,Bjfc) = ov?,fc. 

But this would mean that 9_ and £ + 7 have the same image g{&)^ and hence g 
is not injective. □ 

This is a central result in our analysis: we anticipate here that g being 
injective is a necessary and sufhcient condition for a "priori identifiability of 
X, and thus for having a unique solution of the problem for both inversion 
(standard process tomography) and convex optimization-based (e.g. maximum 
likelihood) methods, up to some basic assumptions on the cost functional. The 
proof is given in full detail in Section |3.2| and |3.3| 

As a consequence of these facts, we can determine the minimal experimental 
resources^ in terms of input states and measured projectors, needed for faithfully 
reconstructing x from noiseless data {/j'/j.}, where f°f. = p^^piji). In the hght of 
proposition |3.1[ the minimal experimental setting is characterized by a choice 
of {lij.pk} such that Stp = B. Recalling the definition of S, through ([l2|, it is 
immediate to see that Stp = ;B if and only if span{nj — d~^Id\ = spanjcTj, j = 
1, . . . , d^ — l} and span{/9fc} = C''^''. We can summarize this fact as a corollary 
of Proposition |3.1[ 
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Corollary 3.1 g is injective if and only if we have at least (P linearly indepen- 
dent input states {pk}, o-nd d^ — 1 measured {Hj} such that 

span{Ilj — d^^Id} ~ span{<7j,j = 1, . . . , d^—1}. 

We call such a set a minimal experimental setting. Notice that, using the termi- 
nology of |3TJ [17] , the minimal quorum of observables consists of c?^ — 1 properly 
chosen elements. While in most of the literature at least observables are 
considered [SUl HH] , we showed it is in principle possible to spare a measure- 
ment channel at the output. A physically-inspired interpretation for this fact is 
that, since we a priori know, or assume, that the channel is TP, measuring the 
component of the observables along the identity does not provide useful infor- 
mation. This is clearly not true if one relaxes the TP condition, as it has been 
done in [3]: in that case, by the same line of reasoning, linearly independent 
observables are the necessary and sufficient for g to be injective. 

As an example relevant to many experimental situation, consider the qubit 
case, i.e. rf = 2. A minimal set of projector has to span the traceless subspace 
q££.2x2. ^^^-^ choose e.g.: 

Px,y = + P± = ± cr^. (16) 

It is clear that there is an asymmetry between the role of output and inputs: in 
fact, exchanging the number of {IIj} and {pk} can not lead to an injective g. 



3.2 Process Tomography by inversion 

Assume that Stp C B, and that the data {fjk} have been collected. Since fjk 
is an estimate of Px{o),pk (nj), consider the following least mean square problem 



eel 



\\m - /II 



(17) 



where g{ff) and / are the vectors obtained by stacking the gjkiO) and fjk j 
1 . . . L, k = . . . AI, respectively. In view of ( 11 1 and (14) we have that g{9) 
T9 + d-^l where 



T = 



tT{B.jkQe 



(18) 



and 1 is a vector of ones. Notice that the £th column of T is formed with 
the inner products of Qi with each Bjk- Since Stp C B, the Qe are linearly 
independent and the Bjk are the generators of B, then T is full column rank, 
namely has rank d'^ ^ d^. Hence, in principle, one can reconstruct 9 as 



= T*{f-l) 



(19) 
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T^^ being the Moore-Penrose pseudo inverse of T ^T]} . If the experimental 
setting is minimal, the usual inverse suffices. However, as it is well known, 
when computing x this way from real (noisy) data, the positivity character 
is typically lost [3TJ [5] . We better illustrate this fact in Section [4] through 
numerical simulations. 

3.3 Convex methods: general framework 

More robust approaches for the estimation of physically-acceptable x (or equiva- 
lent parametrizations) have been developed, most notably by resorting to Maxi- 
mum Likelihood methods [201 |33j [31] [39] . The optimal channel estimation prob- 
lem can be stated, by using the parametrization for x{Q) = d^^Id'^ + J^i (^iQi 
presented in the previous section, as it follows: consider a set of data {fjk} as 
above, and a cost functional J{d) := ho g[6) where h : M.^^^ — >• M is a suitable 
ftinction which depends on the data {fjk}- We aim to find 

^ = argminJ(^) (20) 

subject to 9 belonging to some constrained set C C M''*^''^. In our case 

C = A+ or C = A+ni, 

withyi+ = {9 I xi0) > 0}, while X = | < tr(x(0(nj(^p^)) < 1, Vj,fc}. The 
second constraint may be used when a cost functional which is not well-defined 
for extremal probabilities, or in order to ensure that the estimated channel 
exhibits some noise in each of the measured directions, as it is expected in 
realistic experimental settings. Since the analysis does not change significantly 
in the two settings, we will not distinguish between them where it is not strictly 
necessary. The following result will be instrumental to prove the existence of a 
unique solution. 

Proposition 3.2 C is a bounded set. 

Proof. Since C C A+, it is sufficient to show that A+ is bounded or, equivalently, 
that a sequence {dj}j>o, with 9j S K'' ""^ , and \\9j\\ — > -|-oo, cannot belong to 
,4+. To this end, it is sufficient to show that, as \\9j\\ +oo, the minimum 
eigenvalue of xidj) tends to — oo so that, for j large enough, 9^ does not satisfy 
condition xi^lj) ^ 0- Notice that the map ^ i— x(^) is affine. Moreover, since 
the QgS are lineraly independent, this map is injective. Accordingly, ||x(^j)ll 
approach infinity as — )■ +oo. Since xidj) is a Hermitian matrix, xidj) has 
an eigenvalue Xj such that |Aj| +oo as ||x(^j)ll — >■ +oo. Recall that x(^j) 
satisfies ([S]) by construction which implies that tr(x(^j)) = d namely the sum 
of its eigenvalues is always equal to d. Thus, there exists an eigenvalue of xidj) 
which approaches — oo as j — >■ -|-oo, which is in contrast with its positivity. So, 
C is bounded. □ 
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Here we focus on the following issue: under which conditions on the exper- 
imental setting (or, mathematically, on the set B defined above) do the opti- 
mization approach have a unique solution? In either of the cases above, C is the 
intersection of convex nonempty sets: in fact, Stp and x ^ Eire convex and 
so must be the corresponding sets of 0, and it is immediate to verify that I is 
convex as well; all of these contain ^ = 0, corresponding to ^^d^ : and hence they 
are non empty. In the light of this, it is possible to derive sufficient conditions 
on J for existence and uniqueness of the minimum in the presence of arbitrary 
constraint set C. Define dCo := dC\ {dC n A+) 

Proposition 3.3 Assume h is continuous and strictly convex on g{C), and 

lim Jie)= lim hon{e) = +oo. (21) 

If Stp C B, then the functional J has a unique minimum point in C. 

Proof. Since h is strictly convex on g(C) and the linear function g, in view of 
Proposition [XT] is injective on C, J is strictly convex on C. So, we only need 
to show that J takes a minimum value on C. In order to do so, it is sufficient 
to show that J is inf-compact, i.e., the image of (— oo,r] under the map 
is a compact set. Existence of the minimum for J then follows from a version 
of Weierstrass theorem since an inf-compact function has closed level sets, and 
is therefore, lower semicontinuous [551 P- 56]. Define £q := ( ... ) G 
Observe that x(^o) ^ d~^Id2 > 0. Moreover, being Ilj (g) rank-one 
orthogonal projections 

tr(x(£o)n, ® pD = ^ Vj,/c. (22) 

Therefore, Oq ^ C and call J{(Iq) — Jq < oo. So, we can restrict the search for a 
minimum point to the image of (— oo, Jq] under J^^. Since C is a bounded set 
by construction, to prove inf-compactness of J it is sufficient to guarantee that 

lim J(e) = +00. 
e^dCo 

□ 

3.4 Maximum Likelihood functionals 
3.4.1 Binomial functional 

Assume a certain set of data {fjk} have been obtained, by repeating N times the 
measurement of each pair {pk, Hj). For technical reasons (strict convexity of the 
ML functional on the optimization set) and experimental considerations (noise 
typically irreversibly affects any state), it is typically assumed that < fjk < 1- 
The probability of obtaining a series of outcomes with Cjk — fjkN ones for the 
pair (j, k) is then 

P^{c,k) = ) tr(xn, ® plp" [1 - tr(xn, ® pDf-^^" (23) 
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so that the overall probability of {cjk}, may be expressed as: Pxii^^jk}) = 
Ylj=i Y[k=i ^xi'^jk)- By adopting the Maximum Likelihood (ML) criterion, once 
fixed the {cjk} describing the recorded data, the optimal estimate x of x is 
given by maximizing Pxii^jk}) with respect to x over a suitable set C. Let 
us consider our parametrization of the TP as in ( |TT| ) . If we assume < 
tr(x(^)(nj < 1, since the logarithm function is monotone, it is equivalent 

(up to a constant emerging from the binomial coefficients) to minimize over 
C = yt+ n X[^the function 

m - -^log^x(«)(fefe}) + El°g( jj. ) 

+ (1 - f,k) log[l - tr(x(^)(n, ^ pI))]. (24) 



Here, h{X) = -J2j,k fjk^og{x.jk) + (1 - /jfc)log(l - a^jfe) with Xjk = [X]jk 
and X e K^^^-^ is strictly convex on M*^^^ because < fjk < 1 by assumption. 
Notice that dCo is the set of ^ e A+ for which there exists at least one pair 
(i,k) such that tr(x(0(n^ «> pf )) = 0, 1. Suppose that tr(x(^)(n:; pD ^ 
as 9 ^ DCq. Therefore, log[tr(x(^)(nj (gi p^ ))] -oo. Since cj j, > by 
assumption, we have that 

lim J(^) = -hm V/,felog[tr(x(^)(n,®p^))] 

p — >-uLo u_ — >oL() 

+(l-/,fc)log[l-tr(x(0(n,®pJ))] 
= -^^^k.^'^^r log[tr(x(^)(n.0p?))] 
= +00. 

In similar way, we obtain the same result from the other case, and the conditions 



for existence and uniqueness of the minimum of Proposition 3.3 are satisfied. 

We now discuss consistency of this method. Let 0° be the "true" parameter 
and X = x(^°) be the corresponding x-matrix of the "true" channel. First 
observe that, once fixed the sample frequencies fj^ (or, equivalently, Cj^), 

J{e) >-J2 fjk iog[/,fc] + (1 - f,k) iog[i - 

SO that if there exists 9£C such that tr[x(^)(nj (E) p-")] — fjk, then such a ^ is 
optimal. Hence, in particular, the (unique) optimal solution corresponding to 
the fjk equal to the "true" probabifities tr[x(Hj(g)p|')] is exactly 9°. On the other 
hand, as the number of experiments N increases, the sample frequencies fjk tend 



If the optimization is constrained to ^-|- n X, we are guaranteed that fjk will tend to be 
positive for a sufficiently large numbers of trials. 
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to the "true" probabilities tr[x(nj (g) pj^)]- Therefore, in view of convexity of 
J and of the continuity of J and its first two derivatives, the corresponding 
optimal solution tends to the "true" parameter 9° . This proves consistency. 

3.4.2 Gaussian functional 

Assume a certain data {fjk} have been obtained. For each pk consider the 
sample vector = [ fik ■ ■ ■ fnk G K*'^, that can be thought as a sam- 
ple of = [ tr(x(ni (gjpl^)) ... iT{x{Ii-M ® pD) ]^ ■ Accordingly, we can 
consider the probabilistic model — p'l^+U.k where ^ A/'(0,I]),S > is 
gaussian noise. This noise model is a good representation of certain experimen- 
tal settings in quantum optics, where the sampled frequencies are obtained with 
high number of counts Cj and the gaussian noise is due to the electronic of the 
measurement devices, typically photodiodes. In our model, we can think that 
to each measured Ilj is associated a different device with noise component Vj. 
Notice that, the noise components are in general correlated. Let I?^ denote the 
device associated to Ilj. Then, I?^ will measure the data /ji,...,/jL- Since 
/, ~ J^{p^ , S), the probability of obtaining the outcomes / is then 

— k — X — k 

P^ifJ ^ ^ :Cxp{-;^(/^ (25) 

so that the overall probability of {fjk} is equal to Pxiifjk}) = Y[k=i -^x^f k^- 
By adopting the ML criterion, given {fjk}, the optimal estimate x of X is 
given by maximizing Pxii fjk} ) with respect to x- Taking into account the 
parametrization xid) ^s in it is equivalent to minimize over C = A+ the 
function 

m -2 log (^y^(27r)*^det(E)P^(,) ({/,,}) 

= hlk-P'xie,)^-'(lk-iie/- (26) 

k=l 



Then, it easy to see that the conditions of Proposition [373| are satisfied. Accord- 
ingly the minimum of J is unique. Also in this case it is possible to show, 
along the same lines used for the previous functional, the consistency of the 
method. 

3.5 A convergent Newton-type algorithm 



In Section 3.4 we have presented two ML functionals and showed the uniqueness 
of their solution. Now, we face the problem of (numerically) finding the solution 
9 minimizing J over the prescribed set. In the following we will refer to the 



binomial functional (24), but the results can be easily extended for the Gaussian 
case. 



12 



Consider J as in (24) and assume that Stp C B. Problem (20), with C = 
A+ n I, is equivalent to minimize J over I with the linear inequality constraint 
xid) > 0. Rewrite the problem, making the inequality constraint implicit in the 
objective 

e^mmJ{0) + I_{0) (27) 

where /_ : M — > M is the indicator function for the non positive semidefinite 
matrices x{§.) 

+0O, elsewhere. 

The basic idea is to approximate the indicator function J_ by the convex func- 
tion 

:=-^logdet(x(0) (29) 

where g > is a parameter that sets the accuracy of the approximation (the ap- 
proximation becomes more accurate as q increases) . Then, we take into account 
the approximated problem 

d" = min GJ0) (30) 

where int (C) denotes the interior of C and the convex function 

G,{e):^qJ{9)~\ogdet{x{9)). (31) 

The solution 9 can be computed employing the following Newton algorithm 
with backtracking stage: 

1. Set the initial condition G int (C). 

2. At each iteration, compute the Newton step 



where 



Ml = -Hg^\/G0^ e R'^ (32) 



Si 



= q 

xtr{QsB,k)-tY[x{9)-^Q 



1 fj k fj k 



- tr[x(^)B,fc] ir[x{0)Bjk] 

dG,{9) 
d9r9, 

1 - fjk . fjk 



[1 - trix{9)B,k)? Mx{0)B,,)] 
y.tv{QrB,kMQsBjk) + t^[x{OT^QrX{Oy^ 
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are the element in position s of gradient (understood as column vector) 
and the element in position (r, s) of the Hessian of Gq both computed at 
i- 

3. Set t'^ — 1, and let t^^^ — /2 until all the following conditions hold: 

0<tr[x(^,+<fA^;)i?,,] < 1 Vj, fc 

Gq{9i + t\M_i) < Gq{9_i) + T^fVGj A^; 

where 7 is a real constant, < 7 < \. 

4. Set =9i+ t^A0i e int (C). 

5. Repeat steps 2, 3 and 4 until the condition || VGe^ || < e is satisfied, where 
e is a (small) tolerance threshold, then set 9^ = di- 

This algorithm converges globally: In the first stage, it converges in a linear 
way, while in the last stage, it does converge quadratically. The proof of these 
facts is postponed to Appendix|B] Then, it is possible to show [12, p. 597] that 

J (9) < Jil") < J [9) + ^. (33) 

Hence, cP /q is the accuracy (with respect to ff) of the solution 9^ found. This 
method, however, works well only setting a moderate accuracy. 

An extension of the previous procedure is given by the Barrier method [121 



p. 569] which solves (271 with a specified accuracy ^ > 0: 

1. Set the initial conditions go > and = [ ... ]^ G int (C). 

2. Centering step: At the fc-th iteration compute (f'' & int (C) by minimiz- 

^Qk — l 

ing Gq^ with starting point 9 using the Newton method previously 
presented. 

3. Set Qk+i = mk- 

,2 

4. Repeat steps 2 and 3 until the condition ^ < ^ is satisfied, then set 
9^t- 

So, at each iteration we compute 9 starting from the previously computed 
point 9 , and then increase qk by a factor /i > 1. The choice of the value of 
the parameters qo and /i is discussed in [TSJ p. 574]. Since the Newton method 
used in the centering step globally converges, the sequence {^'^''}fe>o converges 
to the unique minimum point 9 oi J with accuracy ^. Moreover, the number of 
centering steps required to compute 9 with accuracy ^ starting with qo is equal 

to \^^] + i,mp- 601]. 
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4 Simulation results 



In this section we use the fohowing notation: 



IN method denotes the process tomography by inversion of Section 3.2 



• ML method denotes the ML method, using the functional (24) of Section 
Here, the solution is computed using the Barrier method of Section 
with ^ = 10"^ 



3.4 



3.5 



4.1 Performance comparison 

Here, we want to compare the performance of IN and ML method for the qubit 
case d = 2. Consider a set of CPTP map {xi}]=i randomly generated and the 



minimal setting (16). Once the number of measurements N for each couple 
(pfc,Hj) is fixed, we consider the following comparison procedure: 

• At the ^-th experiment, let {c'^.} be the data corresponding to the map 
Xi- Then, compute the corresponding frequencies /jj, — Cjj^/N 

• From {/jfc} compute the estimates xf^ and xt^^ using IN and ML method 
respectively. 

Compute the relative errors 

eMD - 11^,11 ' eM.(0 - 11^,11 ■ (34) 

When the experiments are completed, compute the mean of the relative 
error 

^ 100 ^ 100 

M/AT = XI ^1^(1), fJ-ML ^ J^YI (35) 
1=1 1=1 

• Count the time that the IN method produces an estimate not positive 
semidefinite. This number is denoted as ]j,F. 

In Figure [T] is depicted the results obtained for different lengths N of measure- 
ments related to {c^-fe}. The mean error norm of ML method is smaller than the 
one corresponding to the IN method, in particular when N is small (typical sit- 
uation in the practice). In addition, more than half of the estimates obtained by 
the IN method are not positive semidefinite, i.e not physically acceptable, even 
when N is sufficient large. Finally, we observe that for both methods the mean 
error decrease as N grows. This fact confirms in the practice their consistency. 
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400 600 800 1000 1200 

Figure 1: Comparison performance IN'Vs ML method 
of measurements for each {pk,Iij), /i is the mean relative error as introduced in 



1400 1600 

N is the total number 



(35), while #F denotes the number of failures of the IN method, i.e. the times 



in which the reconstructed x is not positive. 



4.2 Minimal setting 



Let T/i/,L denote the set of the experimental settings with L input states and 
M observables satisfying Proposition |3.1[ Accordingly the set of the minimal 
experimental settings is Tip-i,<p- Here, we consider the case d — 2. We want 
to compare the performance of the minimal settings in 73,4 with those settings 
that employ more input states and observables. We shall do so by picking a 
test channel, finding a minimal setting that performs well, and comparing its 
performance with a non minimal setting in Tm,l, M > 3, L > A that performs 
well in this set while the total number Nt of trials is fixed. 

Consider the Kraus map ([I]) representing a perturbed amplitude damping 
operation (7 = 0.5) with 



/0.9 



\/0.5 




/0.9 





/(15 



K3 = VOA/2I2, K, 
the x-representation 



/0.1/2a;(j), j — 4,5,6, — x,y,z corresponding to 



X = 



0.95 








0.6364 





0.5 














0.05 





0.6364 








0.5 



We set the total number of trials Nt = 3600. Fixed the set Tm,l M > 3 L > 4, 
we take into account the following procedure: 
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• Set TV = TVt \ (LM). 



• Choose a randomly generated collection {Tm}^£j, T„j e Tm,l- 

• Perform 50 experiments for each T™. At the Z-th experiment we have a 
sample data corresponding to x and T^. From {/j™(0} compute 
the estimate Xm(0 using the ML method and the corresponding error 
norm e,„(0 = \\XnAl) ~ x\\/\\x\\- 

• When the experiments corresponding to T™ are completed, compute the 
mean error norm /i„ = ^ Yd=i em(0- 

• When we have fim for m = 1 ... 100, compute 

Ma/,l = min /x^. 
me{i,...aoo} 

In Figure [21 /Ia/.l is depicted for different values of M and L. As we can see. 



0.08 




Figure 2: /ij\/,L for different values of M and L. 

incrementing the number of input states/observables does not lead to an im- 
provement in the performance index. Analogous results have been observed with 
other choices of test maps and Nt- Finally, in Figure[3]is depicted the true x and 
the averaged estimation xml = ^ I]f=i Xm{l) with m = argmin„g{i_..._ioo} 
for Af = 3 and L 4. 
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A Partial trace 

We here briefly recall the definition and some mathematical facts about the 
partial trace, without reference to its fundamental use in statistical quantum 
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Figure 3: Real and imaginary part of x (top) and the averaged estimation xml 
(bottom). In order to improve readability, the vertical scale of the imaginary 
part has been magnified in order to show the errors are below 0.01. 



theory as the way to compute reduced (marginal) states, since we do not employ 
it to that scope. See e.g. [29l |32] for a comprehensive discussion. 

Consider two finite-dimensional vector spaces V W, with dim V — m, dim W = 
n. Let us denote by Aij the set of complex matrices of dimension j x j. Let 
{Mj} be a basis for Aim, and {Nj} be a basis for Ain, representing linear maps 
on V and W, respectively. Consider Aimn = A4m<E)Ain- it is easy to show that 
the X linearly independent matrices {Mj (E) Nk} form a basis for Aimn, 
where denotes the Kronecker product. Thus, one can express any X e A4mn 
as 

jk 

The partial trace over W is the linear map 

trw : Ai mn ^ Mm 

X ^ trw(X) := ^(c,fetr(iVfc))Mj. 

3 

An analogous definition can be given for the partial trace over V. If the two 
vector spaces have the same dimension, n = m, we will indicate with tri and tr2 
the partial traces over the first and the second spaces, respectively. The partial 
trace can be also implicitly defined (without reference to a specific basis) as the 
only linear function such that for any pair X E Aim, Y G Ain'- 

trw{X®Y)=iT{Y)X. 
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By linearity, this clearly implies 



tT{{A(E)I)B) = tr(yltr2(B)). 



Notice that if X G A4mm we may partition X as an to x m block-matrix 
with block of size n x n. In this way the partial trace with respect over the 
second space may be conveniently expressed as: 



trw(X) = tryv 



tr(Xn) 
tr(X„ii) 



Xlm 
Xmm 

tr(Xi™) 

tr(^mm y 



The partial trace with respect to V, trv(X), is instead the n x n matrix having 
in position j, k the trace of the to x ni matrix formed by selecting only the (j, k) 
element of each of the blocks Xjk- 



B Global convergence of the Newton algorithm 

To prove the convergence of our Newton algorithm we need of the following 
result. 

Proposition B.l Consider a function f : X C M" — > M twice dijferentiable on 
X with Hx the Hessian of f at x. Suppose moreover that f is strongly convex on 
a set D C X , i.e. there exists a constant to > such that > ml for x € D, 
and Hx is Lipschitz continuous on D. Let {xi} ^ D he the sequence generated 
by the Newton algorithm. Under these assumptions, Newton's algorithm with 
backtracking converges globally. More specifically, {x,} decreases in linear way 
for a finite number of steps, and converges in a quadratic way to the minimum 
point after the linear .stage. 

Proof See [H 9.5.3, p. 488]. □ 

We proceed in the following way: Identify a compact set D such that 6_i £ D 
and prove that the Hessian is coercive and Lipschitz continuous on D. We then 
apply Proposition |B . 1 1 in order to prove the convergence. 
Since Oq G int (C) wc consider the set 

D:={ee R''"-"" I Gg{e) < Gg{e„)}. (36) 

The presence of the backtracking stage in the algorithm guarantees that the 
sequence Gq{6Q), Gq{9^), . . . is decreasing. Thus e _D, V/ > 0. 

Proposition B.2 The following facts hold: 
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1. D is a compact set. 

2. Hg is coercive and bounded on D, namely there exist s, S > such that 

si < He < SI yeeD. (37) 

3. Hg is Lipschitz continuous on D. 

Proof. 1) D is contained into the bounded set C. Since Z? is a finite dimensional 
space, it is sufficient to show that 

lim GJ9) = +00. (38) 

Here, we have three kind of boundary: dlD int (A+), int {I)r\dA+ and ^I^^A-^-. 
Notice that, logdet(x(^)) takes finite values on djCi int(yl+). Accordingly, 
taking (21) into account, 

lim GJ0) = q lim J(9) = +oo. (39) 

6-^dXnint(A+) e^ainint(A+) 

Then, int (I) D dA+ is the set of 6 for which J is bounded and there exists at 
least one eigenvalue of x(^) equal to zero. Thus, 

lim GJ9)^~ lim log det(x(0) = +oo. (40) 
e-f int(i)naA+ e-i.int(i)naA+ 



Finally, from ( 39 ) and ( 40 1 it follows that Gq {6) diverges as 9 approach 910(9^+ . 
2) First, observe that D C int (C). Since I? is a compact set, there exists s > 
such that 

Xiar^ >sl yeeD. (41) 

Define 

■= [l-tr(x(/)i?,,)]2 + Mxii)B,kW ^ " 
[M^k]r,s tT{QrBjk)tr{QsBjk) 

where Mj^ is a positive semidefinite matrix with rank equal to one. Accordingly, 

j.k 

> qJ2^ik[Mjk]r,.s + str[Qrx{e.r^Qs] 

j.k 

- I^^jk [Mjk]r,s + s'^tl[QrQs] 

3-k 

- <iy^Jjk[Mjk]r.s + S^{Qr,Qs)- 

3,k 
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Since {Qi}}^i are orthonormal matrices and SjkMjk > 0, we have that 

He_>qJ2 ^JhMjk + s'/ > s'/. (42) 

Notice that, Hg is continuous on int (C). Since D C int (C), it follows that Hg 
is continuous on the compact D. Hence, there exists 5 > such that Hg < SI 
y Be D. We conclude that Hg is coercive and bounded on D. 
3) Hg is continuous on D and \\Hg\\ < S V 9 E D, therefore Hg is Lipschitz 
continuous on _D. □ 



Since all the hypothesis of the Proposition |B.1| are satisfied, we have the 
following proposition. 

Proposition B.3 The sequence {^/};>o generated by the Newton algorithm of 
Section 



3.5 converges to the unique minimum point (f E int (C) of Gq. 
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